Demystifying AI of Your Own

Josh Burgess
#ollama#langchain#llm#python#ai

Adopting RAG AI apps — Spoiler: Your Data is Still Your Own

Artificial intelligence would be the ultimate version of Google. The ultimate search engine that would understand everything on the web. It would understand exactly what you wanted, and it would give you the right thing. We’re nowhere near doing that now. However, we can get incrementally closer to that, and that is basically what we work on. — Larry Page

Artificial Intelligence (AI) has revolutionized numerous industries, and 2024 is shaping up to be a pivotal year for Retrieval Augmented Generation (RAG), given its rapid adoption and widespread discussion. In this article, I aim to explain the significance of RAG and provide a brief tutorial on implementing RAG with a local Large Language Model (LLM), ensuring your data remains secure. LLMs like GPT-4 have transformed many applications, from customer service chatbots to complex data analysis, yet their limitations highlight the need for advancements like RAG.

Why RAG?

To understand RAG, it’s crucial to grasp how LLMs are trained and the limitations they face. Artificial Intelligence, a discipline that has been evolving for decades, encompasses Machine Learning as a subfield, and AI’s recent focus has been on LLMs due to their remarkable ability to extract patterns from data and predict answers to input questions. These models use context to find the best fit for a query; if they lack sufficient context, however, they may confidently provide incorrect answers. This happens because LLMs are trained on a fixed dataset: once an LLM is released, it stops training, which means it doesn’t stay updated with current events and might not provide specialized responses. When an LLM generates inaccurate or nonsensical answers with undue confidence, this is known as a hallucination.

Hallucinations may be harmless and amusing when we’re testing how a Chat Agent will respond, but if you’re trying to quickly understand a new team proposal or a law under consideration, incorrect details from an LLM can harm your reputation. This is where Retrieval Augmented Generation (RAG) comes in to mitigate the hallucination issues inherent in LLMs. Retraining an LLM can be both time-consuming and expensive, but RAG offers a more efficient solution.

RAG is a process where an LLM consumes documents and files as context to help answer queries with up-to-date data and provide more specialized responses. For example, Gemini won’t know your Q3 sales information if it isn’t publicly available and included in its training data. RAG addresses hallucinations by enriching your query with relevant, current context.
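
Conceptually, the pipeline is simple: retrieve relevant snippets from your documents, then splice them into the prompt before the model sees it. Here is a minimal sketch of that idea; the retrieve helper is a hypothetical stand-in for the real vector search we build later in this article.

def retrieve(query: str) -> list[str]:
    # Hypothetical helper: a real app would run a similarity search
    # over your own documents (we build this with Chroma below).
    return ["<relevant excerpt from your Q3 sales report>"]

def rag_answer(llm, query: str) -> str:
    # Prepend the retrieved context to the question before invoking the LLM
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return llm.invoke(prompt)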

I Know ChatGPT but I Am Scared To Give It My Data

Every Business is a Software Business — Watts S. Humphrey
If you’re not paying for the product, then you’re the product — The Social Dilemma.

It’s understandable to be cautious about freely sharing data: data drives our decision-making and distinguishes our business from the competition. Data is the product nowadays; it’s the story and presentation we craft around it that drive sales. One common misconception is that any LLM will automatically ingest your data and take control of it out of your hands.

However, cloud providers like GCP, AWS, and Azure offer base models and the capability to bring your own models, running them within your virtual private cloud (VPC). This approach allows you to continue training your models and ensures that your data stays secure and private, without exposure to unauthorized parties. Implementing a solution with an API Gateway can further safeguard your data. By leveraging these tools, your business can harness the power of LLMs without compromising data security.

RAGing on Your Setup

In this tutorial, we will build a RAG agent of your own. The code is available on my GitHub: https://github.com/joshbrgs/medium-tutorials. The project consists of a model built with Python, LangChain, and Chroma, served via a REST API. It’s okay if you don’t fully understand this tech stack; I will walk you through the model part below.

You will need:

- Python 3 and pip
- Ollama installed locally, with the llama3 model pulled
- Git, to clone the tutorial repository

Getting and Testing an LLM

Large Language Models (LLMs) have been a hot-button topic lately, and most people don’t realize that some of these models are open source, allowing anyone to modify and experiment with them. Trained models are published to model zoos; popular zoos are Hugging Face and Ollama. I like both, but in this article we will be using Ollama.
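
Assuming you have Ollama installed, you can download llama3 and chat with it straight from the terminal before writing any code:

# Download the llama3 weights from the Ollama model zoo
ollama pull llama3

# Start an interactive chat session with the model
ollama run llama3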

If you were to ask it for your EBITDA, it would explain how to calculate it and where to look for the numbers, but it couldn’t do the calculation for you, because it has no way of knowing what your documents contain. Let’s start using this LLM in code!

git clone https://github.com/joshbrgs/medium-articles.git
cd demystifying-ai
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

If you would like, the repository has a file already set up with the functionality we need, but if you want to experiment on your own, you can follow along step by step.

from langchain_ollama import ChatOllama

# temperature=0 makes the model's answers as deterministic as possible
llm = ChatOllama(model="llama3", temperature=0)
res = llm.invoke("What is my EBITDA?")
print(res.content)
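
If you run this, the answer will be generic: the model explains what EBITDA is and how to compute it, because it has no access to your numbers. That is exactly the gap we will close with RAG.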

Your Documents: Splitting and Encoding Them

The key aspect of a RAG application is providing your documents as context to the LLM, ensuring that the responses are personalized to you. Documents can be large and difficult to manage, especially when linking multiple ones together. This is where splitting and encoding come into play. Splitting breaks a document into smaller pieces of text, while encoding (embedding) converts each piece into a vector of numbers. Think of it like coordinates for an LLM: texts with similar wording land close together (e.g., Sonic the Hedgehog might be closely related to Delta Sonic Carwash because of the word “Sonic”). Most LLMs have a specific encoding mechanism, so OpenAI’s method will differ from Llama3’s.
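
To make the “coordinates” idea concrete, here is a small sketch (assuming Ollama is running with llama3 pulled) that embeds two phrases and measures how close they are with cosine similarity:

from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="llama3")

# Each phrase becomes a long vector of floats -- its "coordinates"
a = embeddings.embed_query("Sonic the Hedgehog")
b = embeddings.embed_query("Delta Sonic Carwash")

# Cosine similarity: values closer to 1.0 mean closer in embedding space
dot = sum(x * y for x, y in zip(a, b))
norm = (sum(x * x for x in a) ** 0.5) * (sum(x * x for x in b) ** 0.5)
print(dot / norm)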

In our example, I will be using EY’s overview of the EU Artificial Intelligence Act political agreement from February 2024; if you would like to follow along, you can download it from here. There is a lot of information in this PDF. One could modify the application to let users upload their own documentation, or retrieve it from a database or an Office 365 instance if you’re feeling ambitious.

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

file_path = "./ey-eu-ai-act-political-agreement-overview-february-2024.pdf"
loader = PyPDFLoader(file_path)

# Each page of the PDF becomes one document
docs = loader.load()

print(len(docs))

# Break the pages into 1,000-character chunks that overlap by 200 characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

print(len(splits))
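
With chunk_size=1000 and chunk_overlap=200, adjacent chunks share 200 characters, so a sentence that straddles a chunk boundary still appears intact in at least one chunk.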

Setting Up a Vector Database

We have split the file into smaller, parseable chunks; the next step is to embed them. To avoid redoing this work over and over, we can store the embeddings in a vector database, which enables retrieval of the information needed to answer your question. We will be using an open-source vector database called Chroma.

from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

file_path = "./ey-eu-ai-act-political-agreement-overview-february-2024.pdf"
loader = PyPDFLoader(file_path)

docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

# Embed each chunk with llama3 and store the vectors in Chroma
embeddings = OllamaEmbeddings(model="llama3")

vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)

retriever = vectorstore.as_retriever()

# Query it: find the chunks most similar to the question
query = "When will the EU AI Act enter into force?"
search = vectorstore.similarity_search(query)

# Print the best-matching chunk
print(search[0].page_content)
print(search[0])
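
As written, this Chroma store lives in memory and is rebuilt on every run. If you want the embeddings to survive between runs, langchain_chroma accepts a persist_directory argument (a minimal sketch; the directory name is arbitrary):

vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory="./chroma_db",  # reuse this directory on later runs
)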

Testing Out Our RAG Assistant

Now we can use this model to ask detailed questions about EY’s European Artificial Intelligence Act document. The information is distilled and questions are answered directly from our very own documents, without the help of a Google search; this LLM is entirely local and works without the internet. I have taken the liberty of refactoring the code above into something a little more usable and understandable: the script now prompts you for the document of your choice and then for your question. In the repository, I have broken these pieces out into separate modules; you can find the files in the git repo.

# main.py
from pdf_loader import PDFLoader
from document_splitter import DocumentSplitter
from embeddings_generator import EmbeddingsGenerator
from rag_agent import RAGAgent
from utils import create_prompt_template
from langchain_ollama import ChatOllama


def main():
    file_path = input("Please enter the path to the PDF file: ")
    user_prompt = input("Please enter your prompt for the LLM: ")

    # Load the PDF into LangChain documents
    pdf_loader = PDFLoader(file_path)
    documents = pdf_loader.load_documents()

    # Split the documents into overlapping chunks
    splitter = DocumentSplitter()
    splits = splitter.split_documents(documents)

    # Embed the chunks and build a retriever over them
    embeddings_generator = EmbeddingsGenerator()
    retriever = embeddings_generator.generate_embeddings(splits)

    llm = ChatOllama(model="llama3")
    prompt_template = create_prompt_template()

    rag_agent = RAGAgent(retriever, llm, prompt_template)
    results = rag_agent.get_answer(user_prompt)

    print(results["answer"])

    # Show the retrieved chunks that grounded the answer
    for document in results["context"]:
        print(document)
        print()


if __name__ == "__main__":
    main()
# pdf_loader.py
from langchain_community.document_loaders import PyPDFLoader


class PDFLoader:
    def __init__(self, file_path):
        self.file_path = file_path

    def load_documents(self):
        loader = PyPDFLoader(self.file_path)
        return loader.load()
# document_splitter.py
from langchain_text_splitters import RecursiveCharacterTextSplitter


class DocumentSplitter:
    def __init__(self, chunk_size=1000, chunk_overlap=200):
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap

    def split_documents(self, documents):
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=self.chunk_size, chunk_overlap=self.chunk_overlap
        )
        return text_splitter.split_documents(documents)
# embeddings_generator.py
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings


class EmbeddingsGenerator:
    def __init__(self, model="llama3"):
        self.model = model

    def generate_embeddings(self, documents):
        embeddings = OllamaEmbeddings(model=self.model)
        vectorstore = Chroma.from_documents(documents=documents, embedding=embeddings)
        return vectorstore.as_retriever()
# rag_agent.py
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain


class RAGAgent:
    def __init__(self, retriever, llm, prompt_template):
        self.retriever = retriever
        self.llm = llm
        self.prompt_template = prompt_template

    def create_rag_chain(self):
        question_answer_chain = create_stuff_documents_chain(self.llm, self.prompt_template)
        return create_retrieval_chain(self.retriever, question_answer_chain)

    def get_answer(self, prompt):
        rag_chain = self.create_rag_chain()
        return rag_chain.invoke({"input": prompt})
# utils.py
from langchain_core.prompts import ChatPromptTemplate


def create_prompt_template():
    system_prompt = (
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer "
        "the question. If you don't know the answer, say that you "
        "don't know. Use three sentences maximum and keep the "
        "answer concise."
        "\n\n"
        "{context}"
    )
    return ChatPromptTemplate.from_messages(
        [
            ("system", system_prompt),
            ("human", "{input}"),
        ]
    )
python3 main.py
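
When you run it, the script prompts for the path to a PDF and then for your question, prints the model’s answer, and finally prints the retrieved chunks that grounded it.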

If you enjoyed this tutorial and building this RAG application, I highly suggest looking at the LangChain documentation and how-to guides. They are full of rich content and go more in-depth than my quick tutorial. You can extend this project by creating a UI where a user can prompt the model, upload their documentation, and more!

Conclusion

RAG applications can create new business opportunities and enhance your products and understanding of your data. They can generate new material based on your existing content, create personalized chat agents, vet support issues, and distill information for your C-suite. With cloud providers offering hosting solutions for your LLMs, more businesses can adopt this innovative technology and set themselves apart from the competition.

